AITopics | interspeech 2019

Collaborating Authors

interspeech 2019

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Does Audio Deepfake Detection Generalize?

Müller, Nicolas M., Czempin, Pavel, Dieckmann, Franziska, Froghyar, Adam, Böttinger, Konstantin

arXiv.org Artificial IntelligenceAug-27-2024

Current text-to-speech algorithms produce realistic fakes of human voices, making deepfake detection a much-needed area of research. While researchers have presented various techniques for detecting audio spoofs, it is often unclear exactly why these architectures are successful: Preprocessing steps, hyperparameter settings, and the degree of fine-tuning are not consistent across related work. Which factors contribute to success, and which are accidental? In this work, we address this problem: We systematize audio spoofing detection by re-implementing and uniformly evaluating architectures from related work. We identify overarching features for successful audio deepfake detection, such as using cqtspec or logspec features instead of melspec features, which improves performance by 37% EER on average, all other factors constant. Additionally, we evaluate generalization capabilities: We collect and publish a new dataset consisting of 37.9 hours of found audio recordings of celebrities and politicians, of which 17.2 hours are deepfakes. We find that related work performs poorly on such real-world data (performance degradation of up to one thousand percent). This may suggest that the community has tailored its solutions too closely to the prevailing ASVSpoof benchmark and that deepfakes are much harder to detect outside the lab than previously thought.

architecture, dataset, detection, (13 more...)

arXiv.org Artificial Intelligence

2203.16263

Country:

North America > United States > California > San Diego County > San Diego (0.04)
Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)

Genre: Research Report (0.50)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Sensing of inspiration events from speech: comparison of deep learning and linguistic methods

Härmä, Aki, Grossekathöfer, Ulf, Ouweltjes, Okke, Nallanthighal, Venkata Srikanth

arXiv.org Artificial IntelligenceMay-19-2023

Respiratory chest belt sensor can be used to measure the respiratory rate and other respiratory health parameters. Virtual Respiratory Belt, VRB, algorithms estimate the belt sensor waveform from speech audio. In this paper we compare the detection of inspiration events (IE) from respiratory belt sensor data using a novel neural VRB algorithm and the detections based on time-aligned linguistic content. The results show the superiority of the VRB method over word pause detection or grammatical content segmentation. The comparison of the methods show that both read and spontaneous speech content has a significant amount of ungrammatical breathing, that is, breathing events that are not aligned with grammatically appropriate places in language. This study gives new insights into the development of VRB methods and adds to the general understanding of speech breathing behavior. Moreover, a new VRB method, VRBOLA, for the reconstruction of the continuous breathing waveform is demonstrated.

artificial intelligence, machine learning, speech, (18 more...)

arXiv.org Artificial Intelligence

2305.11683

Country:

Europe > Netherlands > North Brabant > Eindhoven (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
(6 more...)

Genre:

Research Report > New Finding (0.48)
Research Report > Experimental Study (0.34)

Industry:

Health & Medicine > Therapeutic Area > Pulmonary/Respiratory Diseases (0.93)
Health & Medicine > Therapeutic Area > Neurology (0.70)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.91)

Add feedback

Designing and Evaluating Speech Emotion Recognition Systems: A reality check case study with IEMOCAP

Antoniou, Nikolaos, Katsamanis, Athanasios, Giannakopoulos, Theodoros, Narayanan, Shrikanth

arXiv.org Artificial IntelligenceApr-3-2023

There is an imminent need for guidelines and standard test sets to allow direct and fair comparisons of speech emotion recognition (SER). While resources, such as the Interactive Emotional Dyadic Motion Capture (IEMOCAP) database, have emerged as widely-adopted reference corpora for researchers to develop and test models for SER, published work reveals a wide range of assumptions and variety in its use that challenge reproducibility and generalization. Based on a critical review of the latest advances in SER using IEMOCAP as the use case, our work aims at two contributions: First, using an analysis of the recent literature, including assumptions made and metrics used therein, we provide a set of SER evaluation guidelines. Second, using recent publications with open-sourced implementations, we focus on reproducibility assessment in SER.

artificial intelligence, evaluation, machine learning, (16 more...)

arXiv.org Artificial Intelligence

doi: 10.1109/ICASSP49357.2023.10096808

2304.0086

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.29)
Europe > Netherlands > North Holland > Amsterdam (0.04)
Europe > Greece > Attica > Athens (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Cognitive Science > Emotion (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.68)

Add feedback

Deep Neural Networks for Automatic Speech Processing: A Survey from Large Corpora to Limited Data

Roger, Vincent, Farinas, Jérôme, Pinquier, Julien

arXiv.org Machine LearningMar-9-2020

Most state-of-the-art speech systems are using Deep Neural Networks (DNNs). Those systems require a large amount of data to be learned. Hence, learning state-of-the-art frameworks on under-resourced speech languages/problems is a difficult task. Problems could be the limited amount of data for impaired speech. Furthermore, acquiring more data and/or expertise is time-consuming and expensive. In this paper we position ourselves for the following speech processing tasks: Automatic Speech Recognition, speaker identification and emotion recognition. To assess the problem of limited data, we firstly investigate state-of-the-art Automatic Speech Recognition systems as it represents the hardest tasks (due to the large variability in each language). Next, we provide an overview of techniques and tasks requiring fewer data. In the last section we investigate few-shot techniques as we interpret under-resourced speech as a few-shot problem. In that sense we propose an overview of few-shot techniques and perspectives of using such techniques for the focused speech problems in this survey. It occurs that the reviewed techniques are not well adapted for large datasets. Nevertheless, some promising results from the literature encourage the usage of such techniques for speech processing.

dataset, interspeech 2019, neural network, (14 more...)

arXiv.org Machine Learning

2003.04241

Country:

Europe > France > Occitanie > Haute-Garonne > Toulouse (0.04)
Oceania > Australia > Queensland > Brisbane (0.04)
Asia > Japan > Kyūshū & Okinawa > Kyūshū > Miyazaki Prefecture > Miyazaki (0.04)
Asia > India (0.04)

Genre: Overview (1.00)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Apple at Interspeech 2019 - Apple

#artificialintelligenceSep-18-2019, 08:10:25 GMT

Apple is attending Interspeech 2019, the world's largest conference on the science and technology of spoken language processing. The conference, of which Apple is a Platinum Sponsor, will take place in Graz, Austria from September 15th to 19th. For Interspeech attendees, join the authors of our accepted papers at our booth to learn more about the great speech research happening at Apple. Apple continues to build cutting-edge technology in the space of machine hearing, speech recognition, natural language processing, machine translation, text-to-speech, and artificial intelligence, improving the lives of millions of customers every day. If you're interested in opportunities to make an impact on Apple products through machine learning research and development, check out our teams at Jobs at Apple.

artificial intelligence, machine learning, natural language, (20 more...)

#artificialintelligence

Country: Europe > Austria > Styria > Graz (0.25)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.74)
Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (0.51)

Add feedback